On applying formal grammar and languages, and deduction to information retrieval modelling
Abstract
The paper applies formal methods (deduction, grammar) to some aspects of information retrieval. A formal definition of information retrieval is given as establishing a measure of a relation between documents and a user model. The user model consists of the query and additional information about the user, which is partly stored and partly deduced from the stored data and a general rule base. The retrieval system should then answer the user model rather than the query alone. Further, it is shown that the set of documents represented in a normal form is recursive, which makes it possible to design an additional validation processor that checks the format of new documents being uploaded. It is also shown that the formal correctness of a query does not necessarily imply that it can be answered positively.

1 User model and deduction

1.1 Information need

Information Retrieval (IR) is concerned with the organisation, storage, retrieval, and evaluation of information relevant to a user's information need. The main components of IR are as follows: user; information need; request; query; information stored in computer(s); appropriate computer programs.

The user has an information need, i.e., wants to find out something or is looking for information on something (e.g., articles published on a certain subject, books written by an author, banks offering online banking services, travel agencies with last minute offers). The information need is formulated as a request for information, in natural language. The request is then expressed as a query, in the form required by the computer programs (e.g., according to the syntax of a query language). These programs retrieve information in response to a query, e.g., they return database records, journal articles, WWW (World Wide Web) pages, etc. This is why, mainly in practice, IR can also be viewed as a system, and the term Information Retrieval System (IRS) is also used.
If a user, say U, is interested in journal articles and/or authors on, e.g., 'mathematical methods and techniques used in information retrieval', then this is the user's information need; let us denote it by IN. The information need IN is reformulated in a form accepted by the search processor (engine); it thus becomes a query, say Q.

1.2 Information retrieval without hidden information

Information is stored in computer databases. More generally, information is stored in entities which may be generically referred to as objects O, e.g., abstracts, articles, images, sounds, etc.; these are traditionally called documents. The objects should be represented in such a way that they can be processed by appropriate algorithms and computer programs. The same holds for queries.

The overall aim of an IR system is to return, or to try to return, information that is relevant to the user, i.e., information that is useful and meaningful. Thus IR may be reformulated symbolically, or formally, as a 4-tuple yielding retrieved objects R as follows:

IR = (U, IN, Q, O) → R

1.3 Implicit (hidden) information

The information need IN is more than its expression as a query Q: IN comprises the query Q plus additional information about the user U. This additional information is specific to the user: spoken languages, fields of interest, preferred journals, specialisation, profession, most frequently used queries, etc. The additional information is important because it is one factor in judging whether a retrieved object is relevant or not. For example, the same search term PROGRAM has different meanings for a computer programmer (meaning a text written in the C programming language and solving a differential equation) and for a conference organiser (meaning the structure and sequence of scientific and social events during the conference). The additional information is obvious to the user (he/she implicitly assumes it) but not to the computer.
Thus we may call this additional information the implicit information I specific to the user U, and we may write:

IN = (Q, I)

Thus IR can be reformulated as being concerned with finding an appropriate relevance relationship, say R, between objects O and information need IN; symbolically:

IR = R(O, IN) = R(O, (Q, I))

1.4 Information retrieval with hidden information

In order for an IR system to find such a relation R, it should be able to take into account the implicit information I as well, and ideally also the information which can be deduced (inferred) from I, to obtain as complete a picture of user U as possible. Thus finding an appropriate relation R would mean obtaining (deriving, inferring) those objects O which match the meaning of the query Q and satisfy the implicit information I. With this, IR becomes:

IR = R(O, (Q, 〈I, |→〉))

where 〈I, |→〉 means I plus the information derivable (e.g., in some language or logic), inferred, or deduced from I. Of course, the relation R is established with some (un)certainty m; thus:

IR = m[R(O, (Q, 〈I, |→〉))]

There is a rich literature on user modelling. Based on [4], [5], we give a small example to render a possible meaning of 〈I, |→〉. The user's implicit information I may be stored permanently and updated as necessary. Consider, for example, the following user:

Identifier: U100
Name: UserOneHundred
Languages spoken: Hungarian
Age: 24
Computer skills: payroll software
Profession: Secretary

A rule base to deduce additional information from I may be:

IF (user has retrieval experience) THEN (user is skilled AND likes shortcuts AND familiar with Boolean expressions)
IF (user has no OR little retrieval experience) THEN (user is a beginner AND prefers menus)
IF (user is a child) THEN (user likes more colours and little text)
IF (user does not speak English) THEN (do not return hits in the English language)
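The deduction step 〈I, |→〉 and the measure m can be illustrated with a minimal Python sketch. It is only one possible reading of the example above, not the paper's method: the fact labels (`no_english`, `little_retrieval_experience`, and so on), the mapping from the stored record of U100 to those facts, the sample documents, and the term-overlap measure used for m are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UserModel:
    """Query Q plus the stored implicit information I about the user."""
    query: str
    facts: set = field(default_factory=set)  # stored implicit information I


# Each rule is (premises, conclusions): IF all premises hold THEN add conclusions.
# The fact names are hypothetical labels chosen for this sketch.
RULES = [
    ({"has_retrieval_experience"}, {"skilled", "likes_shortcuts", "knows_boolean"}),
    ({"little_retrieval_experience"}, {"beginner", "prefers_menus"}),
    ({"is_child"}, {"likes_colours", "likes_little_text"}),
    ({"no_english"}, {"exclude_english_hits"}),
]


def deduce(facts, rules=RULES):
    """Forward chaining: return I together with everything derivable from it."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusions in rules:
            if premises <= closure and not conclusions <= closure:
                closure |= conclusions
                changed = True
    return closure


def retrieve(documents, user):
    """Return (m, title) pairs for documents that match Q and satisfy the deduced facts.

    m is an assumed measure: the fraction of query terms found in the document.
    """
    derived = deduce(user.facts)
    terms = set(user.query.lower().split())
    hits = []
    for doc in documents:
        # Apply the deduced constraint "do not return hits in the English language".
        if "exclude_english_hits" in derived and doc["lang"] == "en":
            continue
        words = set(doc["text"].lower().split())
        m = len(terms & words) / len(terms) if terms else 0.0
        if m > 0:
            hits.append((m, doc["title"]))
    return sorted(hits, reverse=True)


# Hypothetical documents and an assumed encoding of user U100 (speaks Hungarian only).
docs = [
    {"title": "Payroll guide", "lang": "hu", "text": "payroll software guide"},
    {"title": "Payroll handbook", "lang": "en", "text": "payroll software handbook"},
    {"title": "Travel offers", "lang": "hu", "text": "last minute travel"},
]
u100 = UserModel(query="payroll software",
                 facts={"no_english", "little_retrieval_experience"})
print(retrieve(docs, u100))  # [(1.0, 'Payroll guide')]
```

The English handbook matches the query Q equally well, but it is excluded because the system answers the user model (Q together with the facts deduced from I) rather than the query alone.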
Publication date: 2001